GLMM_BINOMIAL

Overview

The GLMM_BINOMIAL function fits a Generalized Linear Mixed Model (GLMM) with a binomial family for analyzing binary outcomes in clustered or hierarchical data. GLMMs extend generalized linear models by incorporating random effects to account for correlation among observations within the same group or cluster, making them essential for multi-level analyses such as patients within hospitals, students within schools, or repeated measurements within subjects.

This implementation uses the BinomialBayesMixedGLM class from the statsmodels library, which employs variational Bayes estimation via the fit_vb() method. Unlike traditional maximum likelihood approaches, variational Bayes approximates the posterior distribution by finding a tractable distribution that minimizes the Kullback-Leibler divergence from the true posterior. This method provides both point estimates (posterior means) and uncertainty assessments (posterior standard deviations) for all model parameters. For theoretical background on variational inference, see Blei, Kucukelbir, and McAuliffe (2017).
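
For orientation, here is a minimal sketch (simulated data; the sizes, coefficients, and variable names are illustrative) that fits a random-intercept model directly with BinomialBayesMixedGLM and reads the posterior summaries this function reports:

import numpy as np
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)
n, n_groups = 120, 6
groups = np.repeat(np.arange(n_groups), n // n_groups)
x = rng.normal(size=n)
u = rng.normal(scale=0.8, size=n_groups)        # true random intercepts
eta = -0.3 + 0.9 * x + u[groups]                # linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))     # simulated binary outcomes

exog = np.column_stack([np.ones(n), x])         # fixed intercept + slope
exog_vc = (groups[:, None] == np.arange(n_groups)).astype(float)
ident = np.zeros(n_groups, dtype=int)           # one shared variance parameter

result = BinomialBayesMixedGLM(y, exog, exog_vc, ident).fit_vb()
print(result.fe_mean)   # posterior means of the fixed effects
print(result.fe_sd)     # posterior standard deviations of the fixed effects
print(result.vcp_mean)  # posterior mean of the log SD of the random intercepts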

The model uses a logit link function to relate the linear predictor to the probability of success:

\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{u}

where \mathbf{X}_i\boldsymbol{\beta} represents fixed effects and \mathbf{Z}_i\mathbf{u} represents random effects with \mathbf{u} \sim N(0, \boldsymbol{\Sigma}). The function returns fixed effects coefficients with associated z-statistics, p-values, confidence intervals, and odds ratios, along with random effects variance components.
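
For intuition, the inverse logit maps a linear predictor back to a probability, and \exp(\beta) is the odds ratio for a one-unit increase in a predictor. A tiny worked example with made-up coefficient values:

import math

beta0, beta1, u_j = -0.5, 0.8, 0.3   # hypothetical intercept, slope, random intercept
eta = beta0 + beta1 * 2.0 + u_j      # linear predictor at x = 2 in group j
p = 1 / (1 + math.exp(-eta))         # inverse logit: probability of success
print(round(p, 3))                   # 0.802
print(round(math.exp(beta1), 3))     # 2.226, odds ratio per unit increase in x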

The function supports random intercept models by default (accounting for baseline differences between groups) and can accommodate random slopes when a random effects design matrix is provided. Random effect standard deviation parameters use log-normal prior distributions, and fixed effects use Gaussian priors. For additional information on GLMMs, see the UCLA Statistical Consulting introduction and the statsmodels GLMM documentation.
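
Concretely, the random-effects design matrix handed to statsmodels has one group-indicator column per group for the random intercept and, for each random covariate, a further block of group columns holding that covariate's values; an ident vector ties each column to its variance parameter. A minimal sketch of this encoding (the data values are illustrative):

import numpy as np

groups = np.array([1, 1, 2, 2, 3, 3])
x_rand = np.array([1.0, 1.0, 1.5, 1.5, 2.0, 2.0])   # one random-slope covariate

uniq, idx = np.unique(groups, return_inverse=True)
n, g = len(groups), len(uniq)

intercept_cols = np.zeros((n, g))
intercept_cols[np.arange(n), idx] = 1.0             # plain group indicators

slope_cols = np.zeros((n, g))
slope_cols[np.arange(n), idx] = x_rand              # indicator times covariate

exog_vc = np.hstack([intercept_cols, slope_cols])
ident = np.concatenate([np.zeros(g, int), np.ones(g, int)])  # 0 = intercept var, 1 = slope var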

This example function is provided as-is without any representation of accuracy.

Excel Usage

=GLMM_BINOMIAL(y, x, groups, x_random, fit_intercept, alpha)
  • y (list[list], required): Binary dependent variable as a column vector (0 or 1).
  • x (list[list], required): Fixed effects predictors matrix where each column is a predictor.
  • groups (list[list], required): Group or cluster membership as a column vector (integer coded).
  • x_random (list[list], optional, default: null): Random effects predictors matrix. If omitted, uses random intercept only.
  • fit_intercept (bool, optional, default: true): If true, includes fixed intercept in the model.
  • alpha (float, optional, default: 0.05): Significance level for confidence intervals (between 0 and 1).

Returns (list[list]): 2D list with GLMM results, or error message string.
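
For reference, the grid is laid out as sketched below (placeholder numbers; one fixed predictor plus a fixed intercept, random intercept only):

expected_layout = [
    ['fixed_effects', 'parameter', 'coefficient', 'std_error',
     'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio'],
    ['', 'intercept', -0.41, 0.52, -0.79, 0.43, -1.43, 0.61, 0.66],
    ['', 'x1', 0.87, 0.49, 1.78, 0.08, -0.09, 1.83, 2.39],
    ['random_effects', 'parameter', 'variance', 'std_dev',
     None, None, None, None, None],
    ['', 'intercept', 0.64, 0.80, None, None, None, None, None],
    ['log_likelihood', None, None, None, None, None, None, None, None],
    ['aic', None, None, None, None, None, None, None, None],
    ['bic', None, None, None, None, None, None, None, None],
]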

Examples

Example 1: Binomial GLMM with random intercept only

Inputs:

y x groups
0 1 1
1 2 1
0 1.5 2
1 2.5 2
1 3 3
0 1.2 3

Excel formula:

=GLMM_BINOMIAL({0;1;0;1;1;0}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3})

Expected output:

"non-error"

Example 2: Binomial GLMM with random slopes

Inputs:

y x groups x_random
0 1 1 1
1 2 1 1
1 1.5 2 1.5
0 2.5 2 1.5
1 3 3 2
1 1.2 3 2

Excel formula:

=GLMM_BINOMIAL({0;1;1;0;1;1}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, {1;1;1.5;1.5;2;2})

Expected output:

"non-error"

Example 3: Binomial GLMM without fixed intercept

Inputs:

y x groups fit_intercept
0 1 1 false
1 2 1
0 1.5 2
1 2.5 2
1 3 3
0 1.2 3

Excel formula:

=GLMM_BINOMIAL({0;1;0;1;1;0}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, , FALSE)

Expected output:

"non-error"

Example 4: Binomial GLMM with 90% confidence intervals

Inputs:

y x groups x_random fit_intercept alpha
0 1 1 1 true 0.1
1 2 1 1
1 1.5 2 1.5
0 2.5 2 1.5
1 3 3 2
1 1.2 3 2

Excel formula:

=GLMM_BINOMIAL({0;1;1;0;1;1}, {1;2;1.5;2.5;3;1.2}, {1;1;2;2;3;3}, {1;1;1.5;1.5;2;2}, TRUE, 0.1)

Expected output:

"non-error"

Python Code

import math
import numpy as np
from scipy import stats as sp_stats
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

def glmm_binomial(y, x, groups, x_random=None, fit_intercept=True, alpha=0.05):
    """
    Fits a Generalized Linear Mixed Model (GLMM) with binomial family for binary clustered data.

    See: https://www.statsmodels.org/stable/mixed_glm.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Binary dependent variable as a column vector (0 or 1).
        x (list[list]): Fixed effects predictors matrix where each column is a predictor.
        groups (list[list]): Group or cluster membership as a column vector (integer coded).
        x_random (list[list], optional): Random effects predictors matrix. If omitted, uses random intercept only. Default is None.
        fit_intercept (bool, optional): If true, includes fixed intercept in the model. Default is True.
        alpha (float, optional): Significance level for confidence intervals (between 0 and 1). Default is 0.05.

    Returns:
        list[list]: 2D list with GLMM results, or error message string.
    """
    # Seed NumPy's global RNG so any stochastic steps in fitting are reproducible
    np.random.seed(42)

    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_array(arr, name, min_rows=1, min_cols=1):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if len(arr) < min_rows:
            return f"Invalid input: {name} must have at least {min_rows} row(s)."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} row {i} must be a list."
            if i == 0:
                expected_cols = len(row)
                if expected_cols < min_cols:
                    return f"Invalid input: {name} must have at least {min_cols} column(s)."
            elif len(row) != expected_cols:
                return f"Invalid input: {name} must have consistent column count."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize and validate inputs
    y = to2d(y)
    x = to2d(x)
    groups = to2d(groups)

    err = validate_2d_array(y, "y", min_rows=2, min_cols=1)
    if err:
        return err
    if len(y[0]) != 1:
        return "Invalid input: y must be a column vector (single column)."

    err = validate_2d_array(x, "x", min_rows=2, min_cols=1)
    if err:
        return err

    err = validate_2d_array(groups, "groups", min_rows=2, min_cols=1)
    if err:
        return err
    if len(groups[0]) != 1:
        return "Invalid input: groups must be a column vector (single column)."

    if len(y) != len(x) or len(y) != len(groups):
        return "Invalid input: y, x, and groups must have the same number of rows."

    # Validate y values are binary
    y_flat = [row[0] for row in y]
    if not all(val in (0, 0.0, 1, 1.0) for val in y_flat):
        return "Invalid input: y must contain only 0 or 1 values."

    # Validate alpha
    if not isinstance(alpha, (int, float)):
        return "Invalid input: alpha must be numeric."
    if math.isnan(alpha) or math.isinf(alpha):
        return "Invalid input: alpha must be finite."
    if not (0 < alpha < 1):
        return "Invalid input: alpha must be between 0 and 1."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Handle x_random
    if x_random is not None:
        x_random = to2d(x_random)
        err = validate_2d_array(x_random, "x_random", min_rows=2, min_cols=1)
        if err:
            return err
        if len(x_random) != len(y):
            return "Invalid input: x_random must have the same number of rows as y."

    # Convert to numpy arrays
    try:
        y_arr = np.array(y_flat)
        x_arr = np.array(x)
        groups_flat = [int(row[0]) for row in groups]
        groups_arr = np.array(groups_flat)

        # Add intercept to fixed effects if requested
        if fit_intercept:
            intercept_col = np.ones((len(x_arr), 1))
            x_arr = np.hstack([intercept_col, x_arr])

        # Random-effects design: one block of group-indicator columns per
        # random term; within a block, a row's group column holds that
        # term's covariate value (all ones for the random intercept)
        n_obs = len(y_arr)
        unique_groups, group_idx = np.unique(groups_arr, return_inverse=True)
        n_groups = len(unique_groups)
        row_idx = np.arange(n_obs)

        terms = np.ones((n_obs, 1))  # column 0: random intercept
        if x_random is not None:
            terms = np.hstack([terms, np.array(x_random, dtype=float)])

        vc_blocks, ident_blocks = [], []
        for k in range(terms.shape[1]):
            block = np.zeros((n_obs, n_groups))
            block[row_idx, group_idx] = terms[:, k]
            vc_blocks.append(block)
            # ident maps each exog_vc column to its variance parameter
            ident_blocks.append(np.full(n_groups, k, dtype=int))

        exog_vc = np.hstack(vc_blocks)
        ident = np.concatenate(ident_blocks)

        # Fit the model
        model = BinomialBayesMixedGLM(
            endog=y_arr,
            exog=x_arr,
            exog_vc=exog_vc,
            ident=ident
        )
        result = model.fit_vb()

        # Extract fixed effects (Bayesian posterior means and SDs)
        fe_mean = result.fe_mean
        fe_sd = result.fe_sd

        # Compute z-scores and two-sided p-values (norm.sf avoids the
        # underflow that 1 - cdf suffers for large |z|)
        z_scores = fe_mean / fe_sd
        p_values = 2 * sp_stats.norm.sf(np.abs(z_scores))

        # Compute confidence intervals
        z_crit = sp_stats.norm.ppf(1 - alpha / 2)
        ci_lower = fe_mean - z_crit * fe_sd
        ci_upper = fe_mean + z_crit * fe_sd

        # Build output
        output = []

        # Section 1: Fixed effects header (9 columns - this is the max width)
        fixed_header = ['fixed_effects', 'parameter', 'coefficient', 'std_error',
                       'z_statistic', 'p_value', 'ci_lower', 'ci_upper', 'odds_ratio']
        output.append(fixed_header)

        # Section 1: Fixed effects rows
        param_names = ['intercept'] if fit_intercept else []
        param_names += [f'x{i+1}' for i in range(x_arr.shape[1] - (1 if fit_intercept else 0))]

        for i, param_name in enumerate(param_names):
            odds_ratio = math.exp(float(fe_mean[i]))
            row = ['', param_name, float(fe_mean[i]), float(fe_sd[i]), float(z_scores[i]),
                   float(p_values[i]), float(ci_lower[i]), float(ci_upper[i]), odds_ratio]
            output.append(row)

        # Section 2: Random effects header (padded to 9 columns)
        random_header = ['random_effects', 'parameter', 'variance', 'std_dev', None, None, None, None, None]
        output.append(random_header)

        # Section 2: Random effects variance components (padded to 9 columns)
        # vcp_mean holds posterior means of the log standard deviations of
        # the random effects, one entry per variance parameter (ident value)
        vcp_mean = result.vcp_mean
        for i in range(len(vcp_mean)):
            param_name = 'intercept' if i == 0 else f'random_x{i}'
            std_dev = math.exp(float(vcp_mean[i]))
            variance = std_dev ** 2
            row = ['', param_name, variance, std_dev, None, None, None, None, None]
            output.append(row)

        # Section 3: Model statistics (padded to 9 columns)
        # Variational Bayes does not yield a maximized log-likelihood, so
        # traditional llf/AIC/BIC are reported as empty rather than as 0.0
        output.append(['log_likelihood', None, None, None, None, None, None, None, None])
        output.append(['aic', None, None, None, None, None, None, None, None])
        output.append(['bic', None, None, None, None, None, None, None, None])

        return output

    except Exception as e:
        return f"Error fitting GLMM: {str(e)}"
